Abstract
Introduction: Sickle cell disease (SCD) is the most common clinically relevant inherited blood disorder and results in several complications, including organ dysfunction, fatigue, and pain. These complications contribute to high acute care utilization and reduced quality of life. To better understand SCD, several registries have been developed, but they are not always maintained because of insufficient funding. Further, these registries do not use common data elements, are poorly coordinated, and fall short of the comprehensive insights provided by the well-structured, longitudinal registries used in other diseases. To overcome the lack of a single robust clinical registry in SCD, we implemented a privacy-preserving technique for linking individuals across 3 of the larger multi-site SCD registries using a hashed identifier token. This proof-of-concept project was designed to enhance our understanding of SCD and to demonstrate that combining information across registries is more accurate than using one data source alone. The findings also highlight the errors that arise when acute care utilization is measured from a single dataset.
Methods: The study was conducted at the University of Alabama at Birmingham (UAB). We have IRB approval for three SCD data collections: the Alabama SCD surveillance database (part of the CDC Sickle Cell Data Collection Project), called ALSTATE; the American Society of Hematology Research Collaborative (ASH RC) Data Hub for people with SCD seen at a UAB hospital; and the Globin Research Network for Data and Discovery (GRNDaD) registry. For each database, we internally created a set of identity tokens based on a selected set of identifiers and then used hashing (SHA-256) to create a hashed token, enabling data linkage without compromising participant privacy. First, we demonstrated that the hashed token could be used to link individuals across all 3 registries. Second, we quantified acute care use within each dataset. Because people with SCD may obtain care for acute pain crisis in the emergency department (ED), hospital, or a day hospital (DH), each data collection system may include different information from different sources. ALSTATE includes ED and hospital use from multiple hospitals; the ASH RC Data Hub includes ED and hospital data from UAB hospitals only; and GRNDaD includes acute care data from DH, ED, and hospitals if those data are available within the patient's chart.
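The hashed-token linkage described in the Methods can be illustrated with a minimal Python sketch. The abstract does not state which identifiers were selected or how they were normalized, so the fields used here (first name, last name, date of birth), the normalization rules, the secret salt, and the function names `make_token` and `link_registries` are all assumptions for illustration only, not the study's implementation.

```python
import hashlib
import unicodedata


def normalize(value):
    """Trim, lowercase, and strip accents so formatting differences
    between registries do not change the resulting token."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_token(first_name, last_name, date_of_birth, salt):
    """Return a SHA-256 hex digest of the normalized identifiers.

    The identifier set and the shared secret salt are hypothetical;
    date_of_birth is assumed to be an ISO string such as '1990-01-31'.
    """
    parts = [normalize(first_name), normalize(last_name), date_of_birth.strip()]
    payload = salt + "|" + "|".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def link_registries(registries):
    """Map each token to the set of registry names in which it appears.

    `registries` is a dict of registry name -> iterable of tokens.
    Only tokens are compared; no raw identifiers leave a registry.
    """
    seen = {}
    for name, tokens in registries.items():
        for token in tokens:
            seen.setdefault(token, set()).add(name)
    return seen


if __name__ == "__main__":
    salt = "example-shared-secret"  # hypothetical; must be identical at every site
    a = make_token("Jane", "Doe", "1990-01-31", salt)
    b = make_token(" JANE ", "doe", "1990-01-31", salt)
    print(a == b)  # same person, different formatting, same token
```

Under these assumptions, each registry computes tokens locally with the same salt and normalization, and only the digests are compared. Counting tokens that appear in two or more registries then gives the number of linked individuals.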
Results: The first part of the project included 8026 records across the 3 registries. With the hashed identifier token, we identified 1080 unique individuals with records in at least two data sets. A total of 340 individuals (93% of the 365 individuals enrolled in GRNDaD) were successfully matched and linked across all data sets.
Second, we evaluated the information on acute care use in each database for a subset of 253 individuals who had acute care data from 2023 available for comparison. Within this cohort, we classified people as having zero, 1-3, 4-6, or >6 acute care visits in 2023. ALSTATE identified 69 people with zero acute care visits and the ASH RC Data Hub identified 118, but GRNDaD showed that only 77 (30.4% of 253) actually had no acute care visits. Similarly, ALSTATE showed that 64 people had >6 acute care visits and the ASH RC Data Hub identified 21, while GRNDaD showed that only 36 people had >6 acute care visits.
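The visit categories used in this comparison (zero, 1-3, 4-6, >6) can be sketched in Python as follows. The function names and the input layout (one dictionary per source, mapping a hashed token to a 2023 visit count) are assumptions, and the sample values are invented toy data, not study results.

```python
from collections import Counter


def visit_category(n_visits):
    """Assign a yearly acute care visit count to one of the four bins."""
    if n_visits < 0:
        raise ValueError("visit count cannot be negative")
    if n_visits == 0:
        return "0"
    if n_visits <= 3:
        return "1-3"
    if n_visits <= 6:
        return "4-6"
    return ">6"


def category_counts(visits_by_token):
    """Count how many individuals fall into each bin for one data source."""
    return Counter(visit_category(n) for n in visits_by_token.values())


if __name__ == "__main__":
    # Toy data only: the same three hypothetical tokens seen by two sources.
    source_a = {"tok1": 0, "tok2": 2, "tok3": 7}
    source_b = {"tok1": 1, "tok2": 5, "tok3": 9}
    for name, data in (("source_a", source_a), ("source_b", source_b)):
        print(name, dict(category_counts(data)))
```

Because the sources are keyed by the same hashed token, the per-person counts can also be compared directly, which shows where one source misses visits that another captures.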
Conclusion: This is the first SCD project to implement a privacy-preserving technique for linking individuals across multiple registries. This approach can leverage unique data elements from each source while ensuring that PHI is protected. Each data set provides key information on people with SCD. In isolation, however, the data from each data set may be misleading, as shown by the acute care use information here. GRNDaD is the only source that includes DH visits for acute pain crisis, as neither ALSTATE nor the ASH RC Data Hub can distinguish these visits from outpatient clinic visits. At the same time, ALSTATE includes data from multiple hospitals, providing a wider view. By linking registries, we can triangulate the information to obtain more accurate data than any one data set provides alone. Enhancing data interoperability is essential to increase our longitudinal understanding of SCD, improve our ability to identify and personalize treatments for affected individuals, and ensure we can bring more treatments to the forefront for this at-risk population.